The threshold behaviour of the K-Satisfiability problem is studied in the framework of the statistical mechanics of random diluted systems. We find that at the transition the entropy is finite and hence that the transition itself is due to the abrupt appearance of logical contradictions in all solutions and not to the progressive decrease of the number of these solutions down to zero. A physical interpretation is given for the different cases $K = 1$, $K = 2$ and $K \geq 3$.

PACS Numbers : 05.20 - 64.60 



The observation of critical behavior in randomly generated combinatorial structures, in mathematical and computer science problems as well as in physical and biological models, has recently revived interest in the Satisfiability (SAT) problem for randomly generated Boolean formulas as a prototype of NP-complete problems exhibiting threshold behaviour and intractability concentration phenomena. The SAT problem is the root problem of complexity theory and consists in determining whether there exists an assignment of Boolean variables $\{x_i = 0, 1\}_{i=1,\ldots,N}$ for which a generic conjunction (logical AND operation) of a set of clauses evaluates to true. Each clause $\{C_\ell\}_{\ell=1,\ldots,M}$ is in turn defined as the logical disjunction (logical OR operation) of a subset of literals, which are either the variables $x_i$ or their negations $\bar{x}_i$. The overall Boolean formula evaluates to true if and only if all clauses are simultaneously satisfied, i.e. iff at least one of the literals in each clause takes the Boolean value 1. K-SAT is a version of SAT in which each clause contains a random set of exactly K literals. When the number of clauses becomes of the same order as the number of variables ($M = \alpha N$) and in the large N limit (indeed the case of interest also in the fields of computer science and artificial intelligence), K-SAT exhibits threshold phenomena. Numerical simulations show that the probability of finding a correct Boolean assignment falls abruptly from one to zero when $\alpha$, the number of clauses per variable, crosses a critical value $\alpha_c(K)$. Above $\alpha_c(K)$, the clauses can no longer all be satisfied and one would rather minimize the number of unsatisfied clauses, which is the optimization version of K-SAT, also referred to as MAX-K-SAT.

This scenario has been proven to be true in the $K = 2$ case. The mapping of 2-SAT onto directed graph theory indeed allows one to derive rigorously the threshold value $\alpha_c = 1$, and an explicit polynomial 2-SAT algorithm working for $\alpha < \alpha_c$ has been developed (whereas, for $\alpha > 1$, MAX-2-SAT is an NP-complete problem).

For $K \geq 3$, much less is known and not only MAX-K-SAT but also K-SAT belongs to the NP-complete class. Some bounds on $\alpha_c(K)$ have been derived and a remarkable application of finite size scaling techniques has recently made it possible to find precise numerical values of $\alpha_c$ for $K = 3, 4, 5, 6$. An important rigorous result is the self-averageness taking place in MAX-K-SAT: independently of the particular sample of M clauses, the minimal fraction of violated clauses is narrowly peaked around its mean value when $N \to \infty$ at fixed $\alpha$ [10]. The situation becomes easier to understand in the large K limit, where a simple probabilistic argument gives the asymptotic expression $\alpha_c(K) \sim 2^K \ln 2$.

The purpose of this letter is to study the K-SAT problem using concepts and techniques of statistical mechanics of random diluted systems. To do so, we map K-SAT onto an energy-cost function by the introduction of spin variables $S_i = \pm 1$ (a simple shift of the Boolean variables) and of a quenched (unbiased) random matrix $C_{\ell i} = 1$ (respectively $-1$) if $x_i$ (resp. $\bar{x}_i$) belongs to the clause $C_\ell$, 0 otherwise. Then the function

$$E[C,S] = \sum_{\ell=1}^{M} \delta\Bigl[\sum_{i=1}^{N} C_{\ell i} S_i \,;\, -K\Bigr] \qquad (1)$$

where $\delta[i;j]$ denotes the Kronecker symbol, equals the number of violated clauses and therefore its ground state (GS) properties describe the transition from K-SAT ($E_{GS} = 0$) to MAX-K-SAT ($E_{GS} > 0$).
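The mapping onto spins can be illustrated with a minimal Python sketch. The clause representation (a list of `(index, sign)` pairs standing for the non-zero entries of one row of the matrix C) and the helper names `random_ksat` and `energy` are illustrative assumptions, not notation from the text. A clause counts as violated exactly when the sum of its $C_{\ell i} S_i$ equals $-K$, i.e. when every one of its literals is false.

```python
import random

def random_ksat(n_vars, n_clauses, k, rng):
    """Draw clauses as lists of (variable index, sign) pairs.

    sign = +1 stands for the literal x_i, sign = -1 for its negation
    (the non-zero entries of one row of the matrix C).
    """
    clauses = []
    for _ in range(n_clauses):
        variables = rng.sample(range(n_vars), k)  # K distinct variables per clause
        clauses.append([(i, rng.choice((1, -1))) for i in variables])
    return clauses

def energy(clauses, spins):
    """Number of violated clauses: clause l is violated iff sum_i C_li * S_i == -K."""
    violated = 0
    for clause in clauses:
        if sum(sign * spins[i] for i, sign in clause) == -len(clause):
            violated += 1
    return violated

rng = random.Random(0)
clauses = random_ksat(n_vars=10, n_clauses=30, k=3, rng=rng)
spins = [rng.choice((1, -1)) for _ in range(10)]
print("violated clauses:", energy(clauses, spins))
```

Storing only the K non-zero entries per clause keeps the cost of one energy evaluation proportional to $KM$ rather than $NM$.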

While previous works on the statistical mechanics of other combinatorial optimization problems, such as the Traveling Salesman, Graph Partitioning or Matching problems, focused mainly on the study of the typical cost of optimal configurations, the issues arising in K-SAT are of a different nature. Below $\alpha_c$, the ground state energy vanishes and the key quantity to be analyzed is the typical number of existing solutions, i.e. the ground state entropy $S_{GS}$, for which no exact results are available so far. Our main result is that $S_{GS}$ is still extensive at $\alpha = \alpha_c$: the transition is not due to a progressive reduction of the number of solutions but to the sudden appearance of logical contradictions in "all" of the exponentially numerous solutions at the threshold.

In order to regularize the model, we compute the partition function

$$Z[C] = \sum_{\{S_i = \pm 1\}} \exp(-\beta E[C,S]) \qquad (2)$$

after having introduced a finite "temperature" $1/\beta$. The typical ground state free energy $\overline{\ln Z[C]} = \lim_{n \to 0} (\overline{Z[C]^n} - 1)/n$, where $\overline{(\ldots)}$ stands for the average over the random clauses, is then recovered in the limit of zero temperature $\beta \to \infty$. The amount of technical work necessary to perform the computation does not allow us to display all details. We therefore limit ourselves to a general description of the methodological steps and focus mainly on the discussion of the results. The complete calculation will be given in a forthcoming paper.

Once we have introduced n replicas $S_i^a$, $a = 1, \ldots, n$, of the system, the average over the disorder C couples all replicas together through the overlaps $Q^{a_1,\ldots,a_{2r}} = \frac{1}{N}\sum_{i=1}^{N} S_i^{a_1} \cdots S_i^{a_{2r}}$ and their conjugated Lagrange parameters $\hat{Q}^{a_1,\ldots,a_{2r}}$ ($r = 1, \ldots, n/2$). The resulting effective Hamiltonian $N\,\mathcal{H}[\{Q\},\{\hat{Q}\}]$ involves all multi-replica overlaps, as expected in diluted spin-glasses, and is therefore much more complicated than in long-range disordered models where only interactions between pairs of replicas appear. The free-energy is evaluated by taking the saddle-point of $\mathcal{H}$ over all overlaps $Q, \hat{Q}$. This highly difficult task may be simplified by noticing that, due to the indistinguishability of the n replicas, the effective Hamiltonian $\mathcal{H}$ must be invariant under any permutation of the replicas. Therefore, one is allowed to look for a solution such that the overlaps only depend upon the number of coupled replicas: $Q^{a_1,\ldots,a_{2r}} = Q_r$, $\hat{Q}^{a_1,\ldots,a_{2r}} = \hat{Q}_r$. This is the so-called Replica Symmetric (RS) Ansatz we shall use hereafter. Moreover, it is convenient to characterize all $Q_r$ by introducing a probability distribution $P(x)$ of the Boolean magnetization $x = \langle S \rangle$, such that $Q_r = \int_{-1}^{1} dx\, P(x)\, x^{2r}$. Elimination of the Lagrange parameters $\hat{Q}_r$ leads to the expression

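$$\frac{1}{N}\overline{\ln Z[C]} = \log 2 - \frac{1}{2}\int_{-1}^{1} dx\, P(x) \log(1 - x^2) + \alpha(1-K)\int_{-1}^{1}\prod_{\ell=1}^{K} dx_\ell\, P(x_\ell)\, \log A_{(K)} + \frac{\alpha K}{2}\int_{-1}^{1}\prod_{\ell=1}^{K-1} dx_\ell\, P(x_\ell)\, \log A_{(K-1)} , \qquad (3)$$

with $A_{(J)} \equiv A_{(J)}[\{x_\ell\}, \beta] = 1 + (e^{-\beta} - 1)\prod_{\ell=1}^{J} (1 + x_\ell)/2$ for $J = K - 1$ and $J = K$. The measure $P(x)$ is given by the saddle point integral equation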



$$P(x) = \frac{1}{1-x^2}\int du\, \cos[\ldots]\;\ldots \qquad (4)$$
A toy version of the K-SAT problem is obtained when $K = 1$. As we shall see, some interesting information may already be obtained from this almost trivial case. A sample of M clauses is completely described by the set of integer numbers $\{t_i, f_i\}$, $i = 1, \ldots, N$, where $t_i$ (respectively $f_i$) is the number of clauses imposing that $S_i$ must be true (resp. false). The partition function (2) corresponding to this sample reads $Z[\{t_i, f_i\}] = \prod_{i=1}^{N} (e^{-\beta t_i} + e^{-\beta f_i})$. Averaging over the probability weight $M! / \prod_{i=1}^{N} (t_i!\, f_i!) / (2N)^M$ of the sample, we find $\frac{1}{N}\overline{\ln Z} = \ln 2 - \alpha\beta/2 + \sum_{l=-\infty}^{\infty} e^{-\alpha} I_l(\alpha) \ln(\cosh(\beta l/2))$, where $I_l$ denotes the $l$-th modified Bessel function. In the limit of zero temperature, the energy and the entropy of the ground state read $E_{GS}(\alpha) = \alpha\,[1 - e^{-\alpha} I_0(\alpha) - e^{-\alpha} I_1(\alpha)]/2$ and $S_{GS}(\alpha) = e^{-\alpha} I_0(\alpha) \ln 2$ respectively. Therefore, as soon as $\alpha$ is non zero, the clauses cannot all be satisfied together, but there is an exponentially large number of different values of the Boolean variables giving the same minimum fraction $E_{GS}(\alpha)/\alpha$ of unsatisfiable clauses. The reason is that, though a Boolean variable may be required to be true ($t_i > 0$) and false ($f_i > 0$) at the same time, a finite fraction of the variables, $e^{-\alpha} I_0(\alpha)$, fulfill the condition $t_i = f_i$. The latter can therefore be chosen at our convenience, without changing the ground state energy. These results may be recovered within our approach, showing that the RS Ansatz is exact for all $\beta$ and $\alpha$ when $K = 1$. The saddle-point equation for $P(x)$ can be explicitly solved at any temperature $1/\beta$ and the solution reads

$$P(x) = \sum_{l=-\infty}^{\infty} e^{-\alpha} I_l(\alpha)\, \delta\bigl(x - \tanh(\beta l/2)\bigr) \qquad (5)$$

In the limit of physical interest $\beta \to \infty$, $P(x)$ reduces to a sum of three Dirac peaks in $x = \pm 1$ and $x = 0$, with weights $(1 - e^{-\alpha} I_0(\alpha))/2$ and $e^{-\alpha} I_0(\alpha)$ respectively. It clearly appears that the finite value of the ground state entropy is due to the presence of unfrozen spins, resulting from the mechanism described above. This is an important feature of the problem which remains valid for any K.
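The $K = 1$ closed forms can be checked numerically with a short Python sketch. It assumes that, in the large N limit, the counts $t_i$ and $f_i$ behave as independent Poisson variables of mean $\alpha/2$ (the usual limit of the multinomial weight quoted above), so that the ground state energy per variable is the average of $\min(t_i, f_i)$ and the entropy is $\ln 2$ times the probability that $t_i = f_i$. The function names and the truncation parameters `terms` and `cutoff` are illustrative choices.

```python
import math

def scaled_bessel_i(order, x, terms=60):
    """e^{-x} I_order(x), from the power series of the modified Bessel function."""
    total = 0.0
    for k in range(terms):
        total += (x / 2.0) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
    return math.exp(-x) * total

def k1_closed_form(alpha):
    """Closed-form ground state energy and entropy per variable for K = 1."""
    g0 = scaled_bessel_i(0, alpha)
    g1 = scaled_bessel_i(1, alpha)
    return alpha * (1.0 - g0 - g1) / 2.0, g0 * math.log(2.0)

def poisson(mean, n):
    return math.exp(-mean) * mean ** n / math.factorial(n)

def k1_direct(alpha, cutoff=60):
    """Average of min(t, f) and ln 2 * Prob(t == f) over Poisson(alpha/2) counts."""
    energy = entropy = 0.0
    for t in range(cutoff):
        for f in range(cutoff):
            weight = poisson(alpha / 2.0, t) * poisson(alpha / 2.0, f)
            energy += weight * min(t, f)  # the cheaper side is violated
            if t == f:
                entropy += weight * math.log(2.0)  # the spin is free to take either value
    return energy, entropy

for alpha in (0.5, 1.0, 2.0):
    print(alpha, k1_closed_form(alpha), k1_direct(alpha))
```

The two columns should agree to numerical precision for moderate $\alpha$; for large $\alpha$ both truncations would need to be increased.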

Another relevant mechanism is the accumulation of magnetizations $\langle S \rangle = \pm(1 - O(e^{-|z|\beta}))$, $z = O(1)$, giving two Dirac peak contributions to $P(x)$ in $x = \pm 1$ in the zero temperature limit, as can be seen from (5). The occurrence of such peaks means that a finite fraction of spins are frozen and hence that a further increase of $\alpha$ beyond $\alpha_c$ would cause the appearance of unsatisfiable clauses.

This scenario remains valid for any K. The fraction of violated clauses at temperature $1/\beta$ can indeed be computed through $E = -\frac{1}{N}\,\partial \overline{\ln Z}/\partial \beta$. The ground state energy will clearly depend only upon the magnetizations of order $\pm(1 - O(e^{-|z|\beta}))$, if any. These contributions can be picked up by the new function $R(z) = \lim_{\beta \to \infty} \beta\, P(\tanh(\beta z/2))/2/\cosh^2(\beta z/2)$ satisfying the saddle-point equation

$$R(z) = \int_{-\infty}^{\infty} \frac{du}{2\pi}\, \cos(uz)\, \exp\Bigl[-\frac{\alpha K}{2^{K-1}} + \alpha K \int_{0}^{\infty} \prod_{\ell=1}^{K-1} dz_\ell\, R(z_\ell)\, \cos\bigl(u \min(1, z_1, \ldots, z_{K-1})\bigr)\Bigr] \qquad (6)$$
Remarkably, it is possible to find analytically an exact solution to this functional equation for any K and $\alpha$:

$$R(z) = \sum_{\ell=-\infty}^{\infty} e^{-\gamma} I_\ell(\gamma)\, \delta(z - \ell) \qquad (7)$$

where $\gamma$ is the solution of the implicit equation

$$\gamma = \alpha K \left[\frac{1 - e^{-\gamma} I_0(\gamma)}{2}\right]^{K-1} \qquad (8)$$

The corresponding cost function equals $E_{GS}(\alpha) = \gamma\,(1 - e^{-\gamma} I_0(\gamma) - K e^{-\gamma} I_1(\gamma))/2/K$. We shall now analyse the physical structure of this solution and show how the predictions it leads to for $K = 2$ differ qualitatively from the case $K \geq 3$.

Self-consistency equation (8) admits the solution $\gamma = 0$ for any $\alpha$ [14]. When $K = 2$, there is another solution $\gamma(\alpha) > 0$ above $\alpha = 1$. This new solution maximizes $E_{GS}$ (and hence the free-energy) and must be preferred. Therefore, our RS theory predicts that $E_{GS} = 0$ for $\alpha < 1$ and increases continuously when $\alpha > 1$, giving back the rigorous result $\alpha_c(2) = 1$. The transition taking place at $\alpha_c$ is of second order with respect to the order parameter: the value of $\gamma$ does not show any jump and two Dirac peaks for $P(x)$ progressively appear in $x = \pm 1$ with amplitude $(1 - e^{-\gamma} I_0(\gamma))/2$ each. For large $\alpha$, the RS ground state energy scales as $E_{GS} \simeq \alpha/4$, which is known to be exact. As far as 2-SAT is concerned, the value of the threshold is correctly predicted and one can reasonably assume that the RS solution is valid below $\alpha = \alpha_c = 1$. Above $\alpha_c$, further analysis is needed [13] in order to discuss the exactness of the RS solution.
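For $K = 2$ the implicit equation (8) reduces to $\gamma = \alpha\,(1 - e^{-\gamma} I_0(\gamma))$, which can be solved by bisection. The sketch below is one possible numerical treatment, not the procedure used in the text; the function names, the series truncation and the bracketing interval $[10^{-6}, \alpha + 1]$ are assumptions, and the bracket is not suitable for $\alpha$ extremely close to 1, where the non-zero root falls below the lower bound.

```python
import math

def scaled_bessel_i(order, x, terms=60):
    """e^{-x} I_order(x), from the power series of the modified Bessel function."""
    total = 0.0
    for k in range(terms):
        total += (x / 2.0) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
    return math.exp(-x) * total

def residual(gamma, alpha):
    """gamma - alpha * (1 - e^{-gamma} I_0(gamma)): eq. (8) at K = 2, moved to one side."""
    return gamma - alpha * (1.0 - scaled_bessel_i(0, gamma))

def solve_gamma_k2(alpha):
    """Non-zero solution of eq. (8) for K = 2 when alpha > 1, else the trivial gamma = 0."""
    if alpha <= 1.0:
        return 0.0
    lo, hi = 1e-6, alpha + 1.0  # residual is negative at lo and positive at hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid, alpha) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rs_energy_k2(alpha):
    """RS ground state energy gamma * (1 - e^-g I_0 - 2 e^-g I_1) / 4 for K = 2."""
    gamma = solve_gamma_k2(alpha)
    return gamma * (1.0 - scaled_bessel_i(0, gamma) - 2.0 * scaled_bessel_i(1, gamma)) / 4.0

for alpha in (0.5, 1.5, 2.0, 4.0):
    print(alpha, solve_gamma_k2(alpha), rs_energy_k2(alpha))
```

The energy printed for $\alpha \leq 1$ is zero and it grows continuously from zero above the threshold, in line with the second-order scenario described above.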

For $\alpha > \alpha_c$, there no longer exist sets of $S_i$'s such that the energy (1) remains non extensive. The vanishing of the exponentially large number of solutions that were present below the threshold is surprisingly abrupt. We have indeed studied the number of such solutions as a function of the number of clauses per spin in the range $0 < \alpha < \alpha_c$. Their logarithm (divided by N), that is the entropy of the ground state $S_{GS}(\alpha)$, is given by (3) when $\beta \to \infty$. Finding the solution $P(x)$ of the implicit functional equation (4) is a difficult numerical task. We have therefore resorted to an exact expansion of $P(x)$ in powers of $\alpha$, starting from $P(x)|_{\alpha=0} = \delta(x)$, and injected the resulting probability function into (3) to obtain the expansion of $S_{GS}(\alpha)$. At the 8-th order (which implies an uncertainty of less than one percent), we have found that $S_{GS}(\alpha_c) \simeq 0.38$, which is still very high as compared to $S_{GS}(0) = \ln 2$ (see Fig. 1). It is remarkable that the entropy does not vanish at the transition but keeps an extensive value just below the threshold. The transition is therefore due to the abrupt appearance of contradictory logical loops in "all" solutions at $\alpha = \alpha_c$ and not to the progressive decrease of the number of these solutions down to zero at the threshold.

Let us turn now to the $K \geq 3$ case. Resolution of the implicit equation (8) leads to the following picture. For $\alpha < \alpha_m(K)$, there exists the solution $\gamma = 0$ only. At $\alpha_m(K)$, a non zero solution $\gamma(\alpha)$ discontinuously appears. The corresponding ground state energy is negative in the range $\alpha_m(K) < \alpha < \alpha_s(K)$, meaning that the new solution is metastable and that $E_{GS} = 0$ up to $\alpha_s(K)$. For $\alpha > \alpha_s(K)$ the $\gamma(\alpha) \neq 0$ solution becomes thermodynamically stable, leading to the conclusion that $\alpha_s(K)$ corresponds to the desired threshold $\alpha_c(K)$. However, this prediction is wrong, as can be immediately seen for $K = 3$, since the experimental value $\alpha_c(3) = 4.17 \pm 0.05$ is lower than $\alpha_m(3) \simeq 4.667$ and $\alpha_s(3) \simeq 5.181$. In addition, large K evaluations give $\alpha_m(K) \sim K 2^K/16/\pi$ and $\alpha_s(K) \sim K 2^K/4/\pi$, which grow faster than the asymptotic value $\alpha_c(K) \sim 2^K \ln 2$ [15].
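The two RS thresholds quoted for $K = 3$ can be recovered numerically from the implicit equation $\gamma = \alpha K\,[(1 - e^{-\gamma} I_0(\gamma))/2]^{K-1}$ and the energy formula $E_{GS} = \gamma\,(1 - e^{-\gamma} I_0(\gamma) - K e^{-\gamma} I_1(\gamma))/2/K$. The Python sketch below takes one possible route, assumed here for illustration: it inverts the implicit equation into $\alpha(\gamma)$, takes $\alpha_m$ as the minimum of $\alpha(\gamma)$ over a grid, and finds $\alpha_s$ by bisection on the sign of the energy. Function names, grid bounds and step counts are illustrative choices.

```python
import math

def scaled_bessel_i(order, x, terms=80):
    """e^{-x} I_order(x), from the power series of the modified Bessel function."""
    total = 0.0
    for k in range(terms):
        total += (x / 2.0) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
    return math.exp(-x) * total

def alpha_of_gamma(gamma, k):
    """Invert the implicit equation: the alpha for which gamma is self-consistent."""
    m = (1.0 - scaled_bessel_i(0, gamma)) / 2.0
    return gamma / (k * m ** (k - 1))

def energy_bracket(gamma, k):
    """1 - e^-g I_0(g) - K e^-g I_1(g): carries the sign of the RS energy for gamma > 0."""
    return 1.0 - scaled_bessel_i(0, gamma) - k * scaled_bessel_i(1, gamma)

def spinodal_alpha(k, lo=0.05, hi=8.0, steps=4000):
    """alpha_m: smallest alpha admitting a non-zero solution (grid scan in gamma)."""
    return min(alpha_of_gamma(lo + (hi - lo) * i / steps, k) for i in range(steps + 1))

def stability_alpha(k, lo=0.05, hi=8.0):
    """alpha_s: alpha at which the energy of the non-zero solution crosses zero."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if energy_bracket(mid, k) < 0.0:
            lo = mid
        else:
            hi = mid
    return alpha_of_gamma(0.5 * (lo + hi), k)

print("alpha_m(3) =", round(spinodal_alpha(3), 3))
print("alpha_s(3) =", round(stability_alpha(3), 3))
```

Parametrizing by $\gamma$ rather than $\alpha$ avoids solving the implicit equation repeatedly, since $\alpha(\gamma)$ is explicit.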

This situation, strongly reminiscent of neural networks with binary couplings [12], may be understood by an inspection of the RS ground state entropy. To do so, we have expanded $S_{GS}$ to the $\ell$-th order in $\alpha$ using the same method as for $K = 2$ and denoted by $\alpha_{ze}^{(\ell)}(K)$ the point where it vanishes. Note that $\alpha_{ze}^{(1)}(K)$ corresponds to the annealed theory while $\alpha_{ze}^{(\ell)}(K)$ converges to $\alpha_{ze}(K)$ when $\ell \to \infty$, that is the exact value of $\alpha$ at which $S_{GS}$ goes to zero. For $K = 3$, we have performed the expansion up to $\ell = 8$ and found $\alpha_{ze}^{(1)} = 5.1909$, $\alpha_{ze}^{(2)} = 5.0144$, $\alpha_{ze}^{(3)} = 4.9189$, $\alpha_{ze}^{(4)} = 4.8589$, $\alpha_{ze}^{(5)} = 4.8187$, $\alpha_{ze}^{(6)} = 4.7893$, $\alpha_{ze}^{(7)} = 4.7677$, $\alpha_{ze}^{(8)} = 4.7504$, indicating that $\alpha_{ze}$ is definitely larger than $\alpha_c \simeq 4.17$. Repeating the calculation for $K = 4, 5, 6$, we have obtained qualitatively similar results which show an even quicker convergence towards a zero entropy point such that $\alpha_c(K) < \alpha_{ze}(K) < \alpha_s(K)$. Finally, in the large K limit, $\alpha_{ze}(K)$ asymptotically reaches the threshold $\alpha_c(K)$ from above. As K grows, fluctuations get weaker and weaker and Gaussian RS theory becomes exact [16].
Solving equation (4), we find for $K \gg 1$ and $\alpha < \alpha_c(K)$

$$P(x) = \frac{1}{\sqrt{2\pi u(\alpha)}\,(1 - x^2)}\, \exp[\ldots] \qquad (9)$$

where $u(\alpha) = \alpha K/4^{K-1}$. As a consequence, $P(x) \to \delta(x)$ when $K \to \infty$, which tells us that the annealed theory becomes exact in the large K limit.
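The annealed theory mentioned here can be made concrete with a short Python sketch. It uses the standard first-moment count, stated as an assumption of this example: each of the $2^N$ assignments satisfies a random clause of K distinct variables with probability $1 - 2^{-K}$, so the logarithm of the mean number of solutions per variable is $\ln 2 + \alpha \ln(1 - 2^{-K})$. Its zero reproduces the first-order value 5.1909 quoted for $K = 3$ and approaches $2^K \ln 2$ at large K. The function names are illustrative.

```python
import math

def annealed_entropy(alpha, k):
    """ln 2 + alpha * ln(1 - 2^-K): log of the mean number of solutions, per variable."""
    return math.log(2.0) + alpha * math.log(1.0 - 2.0 ** (-k))

def annealed_zero_entropy_alpha(k):
    """The alpha at which the annealed entropy vanishes."""
    return math.log(2.0) / -math.log(1.0 - 2.0 ** (-k))

for k in (3, 4, 5, 6, 10, 20):
    # compare the exact zero of the annealed entropy with the large-K asymptote
    print(k, annealed_zero_entropy_alpha(k), 2 ** k * math.log(2.0))
```

Because the mean number of solutions bounds the typical number from above, this zero-entropy point is an upper bound on the threshold rather than an estimate of it at finite K.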

Therefore, the situation is as follows for (finite) $K \geq 3$. Above $\alpha_{ze}$, the RS entropy is negative whereas it has to be the logarithm of an integer number. The RS Ansatz is clearly unphysical in this range, explaining why $\alpha_s$ and $\alpha_c$ do not coincide. At the threshold $\alpha_c$ (which is experimentally known to be lower than $\alpha_{ze}$), the RS entropy is still extensive. The crucial question now arises whether this result is exact or is affected by Replica Symmetry Breaking (RSB) effects. To clear up this dilemma, an analysis of RSB effects would be required. Due to the general complexity of such an approach in diluted models [11] and the technical difficulty of the K-SAT problem, the preliminary attempts we have made in this direction have not been successful yet [13]. We have then resorted to exhaustive numerical simulation in the range $N = 12, \ldots, 28$ and compared the corresponding ground state entropies $S_{GS}(\alpha)$ to our RS theory for $K = 3$. As reported in Fig. 1, for $\alpha < \alpha_c$, our analytical solution agrees very well with the numerical results. This confirms that the entropy of the ground state is finite at the threshold. The comparison may be made more precise by a careful extrapolation of the entropy $S_{GS}(\alpha \simeq 4.17)$ in $1/N$ (see inset of Fig. 1). The extrapolated value appears to be in perfect agreement with the RS prediction $S_{GS} \simeq 0.1$. Therefore, RSB corrections to the RS theory seem to be absent below $\alpha_c$, which leads us to think that RSB could occur at $\alpha_c$ exactly.
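The exhaustive enumeration described above can be sketched in a few lines of Python for small N. This is an illustrative reimplementation under stated assumptions, not the code used for Fig. 1: clauses are drawn with K distinct variables and unbiased signs, the number of clauses is `round(alpha * n_vars)`, and the entropy of a sample is the logarithm of the number of assignments reaching the minimal number of violated clauses, divided by N. All function names are hypothetical.

```python
import itertools
import math
import random

def ground_state(clauses, n_vars):
    """Return (minimal number of violated clauses, number of assignments reaching it)."""
    best, count = None, 0
    for spins in itertools.product((1, -1), repeat=n_vars):
        # a clause is violated when every one of its literals is false
        e = sum(1 for clause in clauses
                if all(sign * spins[i] == -1 for i, sign in clause))
        if best is None or e < best:
            best, count = e, 1
        elif e == best:
            count += 1
    return best, count

def average_entropy(n_vars, alpha, k, n_samples, seed=0):
    """Sample average of ln(ground state degeneracy) / N over random K-SAT instances."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        clauses = [[(i, rng.choice((1, -1))) for i in rng.sample(range(n_vars), k)]
                   for _ in range(round(alpha * n_vars))]
        _, count = ground_state(clauses, n_vars)
        total += math.log(count) / n_vars
    return total / n_samples

print(average_entropy(n_vars=10, alpha=2.0, k=3, n_samples=20))
```

The cost grows as $2^N$ per sample, which is why such enumerations are limited to a few tens of variables and need an extrapolation in $1/N$.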

To conclude, one should however not deduce from the above remark that the structure of the solution-space is simple. It might well happen that the solution-space has a non trivial structure which is not reflected by the magnetization distribution $P(x)$ alone. It would be very interesting to understand whether such a phenomenon could take place in the K-SAT problem and what information the hidden structure of the solution-space could give us about its algorithmic complexity.



FIG. 1. Entropy vs. $\alpha$ for $K = 2$ and 3. The analytical solutions (solid lines) are compared with exhaustive numerical simulations for $N = 12, 16, 20, 24$ with 30000, 15000, 7500, 2500 samples respectively (for $K = 2$ we stop at $\alpha = 2.5$ whereas for $K = 3$ at $\alpha = 6$). Error bars are within 10% and thus not reported explicitly. Inset: $1/N$ entropy extrapolation for $\alpha = 4.17$ (on average for each N), $N = 20, 22, 24, 26, 28$ (with 16500, 11500, 7500, 4000, 3000 samples respectively) and $K = 3$.